Introduction

Structured Query Language (SQL) is a powerful tool for managing and manipulating relational databases. It is essential for data scientists, analysts, and anyone working with large datasets. This chapter will explore the importance of SQL, its applications, and provide example code format to illustrate its utility.

Why SQL is Important

Data Management: SQL allows for efficient management of large volumes of data. It provides the means to create, read, update, and delete data in a relational database.
Data Retrieval: With SQL, you can perform complex queries to retrieve specific data from one or more tables, making it easier to analyze and draw insights.
Data Manipulation: SQL enables the manipulation of data through operations such as sorting, filtering, and aggregating. This is crucial for data cleaning and preprocessing.
Data Integration: SQL supports the integration of data from different sources, allowing for comprehensive data analysis.
Standardization: SQL is a standardized language used by most relational database management systems (RDBMS), making it a versatile and essential skill for professionals in the field.

Basic SQL Concepts

import sqlite3
import pandas as pd

Basic SQL Concepts

To illustrate the following SQL concepts, we will use the employees table with the following data:

Table: employees

id	first_name	last_name	department	salary
1	Alice	Smith	HR	60000
2	Bob	Johnson	IT	80000
3	Charlie	Lee	Sales	55000
4	David	Kim	HR	75000
5	Eva	Brown	IT	65000
6	Frank	Wilson	Sales	70000
7	Grace	Taylor	HR	62000
8	Henry	Anderson	IT	77000
9	Irene	Thomas	Sales	53000
10	Jack	White	HR	58000
11	Karen	Harris	IT	69000
12	Leo	Martin	Sales	50000
13	Mia	Jackson	HR	64000
14	Noah	Lee	IT	72000
15	Olivia	Perez	Sales	68000
16	Paul	Young	HR	61000
17	Quinn	King	IT	76000
18	Rachel	Scott	Sales	57000
19	Sam	Green	HR	63000
20	Tina	Adams	IT	81000

SELECT and FROM

The SELECT statement is used to fetch data from a database, and the FROM clause specifies the table.

-- Selecting all columns from a table
p = """

SELECT * 
FROM 
  employees;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
1	Alice	Smith	HR	60000
2	Bob	Johnson	IT	80000
3	Charlie	Lee	Sales	55000
4	David	Kim	HR	75000
5	Eva	Brown	IT	65000
6	Frank	Wilson	Sales	70000
7	Grace	Taylor	HR	62000
8	Henry	Anderson	IT	77000
9	Irene	Thomas	Sales	53000
10	Jack	White	HR	58000
11	Karen	Harris	IT	69000
12	Leo	Martin	Sales	50000
13	Mia	Jackson	HR	64000
14	Noah	Lee	IT	72000
15	Olivia	Perez	Sales	68000
16	Paul	Young	HR	61000
17	Quinn	King	IT	76000
18	Rachel	Scott	Sales	57000
19	Sam	Green	HR	63000
20	Tina	Adams	IT	81000

-- Selecting specific columns
p = """

SELECT 
  first_name, 
  last_name, 
  salary 
FROM 
  employees;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	salary
1	Alice	Smith	60000
2	Bob	Johnson	80000
3	Charlie	Lee	55000
4	David	Kim	75000
5	Eva	Brown	65000
6	Frank	Wilson	70000
7	Grace	Taylor	62000
8	Henry	Anderson	77000
9	Irene	Thomas	53000
10	Jack	White	58000
11	Karen	Harris	69000
12	Leo	Martin	50000
13	Mia	Jackson	64000
14	Noah	Lee	72000
15	Olivia	Perez	68000
16	Paul	Young	61000
17	Quinn	King	76000
18	Rachel	Scott	57000
19	Sam	Green	63000
20	Tina	Adams	81000

SELECT EXCLUDE and RENAME

You can exclude columns using SELECT and rename them for clarity.

-- Selecting all but one column
p = """

SELECT * 
EXCLUDE 
  salary 
FROM 
  employees;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department
1	Alice	Smith	HR
2	Bob	Johnson	IT
3	Charlie	Lee	Sales
4	David	Kim	HR
5	Eva	Brown	IT
6	Frank	Wilson	Sales
7	Grace	Taylor	HR
8	Henry	Anderson	IT
9	Irene	Thomas	Sales
10	Jack	White	HR
11	Karen	Harris	IT
12	Leo	Martin	Sales
13	Mia	Jackson	HR
14	Noah	Lee	IT
15	Olivia	Perez	Sales
16	Paul	Young	HR
17	Quinn	King	IT
18	Rachel	Scott	Sales
19	Sam	Green	HR
20	Tina	Adams	IT

-- Renaming columns
p = """

SELECT 
  first_name AS fname, 
  last_name AS lname 
FROM 
  employees;

"""

pd.read_sql_query(p, con)

fname	lname
Alice	Smith
Bob	Johnson
Charlie	Lee
David	Kim
Eva	Brown
Frank	Wilson
Grace	Taylor
Henry	Anderson
Irene	Thomas
Jack	White
Karen	Harris
Leo	Martin
Mia	Jackson
Noah	Lee
Olivia	Perez
Paul	Young
Quinn	King
Rachel	Scott
Sam	Green
Tina	Adams

LIMIT and OFFSET

The LIMIT clause restricts the number of rows returned, and OFFSET skips rows before beginning to return rows.

-- Limiting the number of rows returned

p = """

SELECT * 
FROM 
  employees 
LIMIT 
  10;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
1	Alice	Smith	HR	60000
2	Bob	Johnson	IT	80000
3	Charlie	Lee	Sales	55000
4	David	Kim	HR	75000
5	Eva	Brown	IT	65000
6	Frank	Wilson	Sales	70000
7	Grace	Taylor	HR	62000
8	Henry	Anderson	IT	77000
9	Irene	Thomas	Sales	53000
10	Jack	White	HR	58000

-- Skipping rows
p = """

SELECT * 
FROM 
  employees 
LIMIT 10 
OFFSET 5;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
6	Frank	Wilson	Sales	70000
7	Grace	Taylor	HR	62000
8	Henry	Anderson	IT	77000
9	Irene	Thomas	Sales	53000
10	Jack	White	HR	58000
11	Karen	Harris	IT	69000
12	Leo	Martin	Sales	50000
13	Mia	Jackson	HR	64000
14	Noah	Lee	IT	72000
15	Olivia	Perez	Sales	68000

ORDER BY

The ORDER BY clause sorts the result set.

-- Sorting the result set by salary in ascending order
p = """

SELECT * 
FROM  
  employees 
ORDER BY 
  salary;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
12	Leo	Martin	Sales	50000
9	Irene	Thomas	Sales	53000
3	Charlie	Lee	Sales	55000
10	Jack	White	HR	58000
1	Alice	Smith	HR	60000
19	Sam	Green	HR	63000
7	Grace	Taylor	HR	62000
13	Mia	Jackson	HR	64000
5	Eva	Brown	IT	65000
11	Karen	Harris	IT	69000
18	Rachel	Scott	Sales	57000
15	Olivia	Perez	Sales	68000
6	Frank	Wilson	Sales	70000
14	Noah	Lee	IT	72000
17	Quinn	King	IT	76000
8	Henry	Anderson	IT	77000
4	David	Kim	HR	75000
2	Bob	Johnson	IT	80000
20	Tina	Adams	IT	81000
16	Paul	Young	HR	61000

-- Sorting in descending order
p = """

SELECT * 
FROM 
  employees 
ORDER BY 
  salary DESC;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
20	Tina	Adams	IT	81000
2	Bob	Johnson	IT	80000
8	Henry	Anderson	IT	77000
17	Quinn	King	IT	76000
4	David	Kim	HR	75000
14	Noah	Lee	IT	72000
6	Frank	Wilson	Sales	70000
15	Olivia	Perez	Sales	68000
11	Karen	Harris	IT	69000
5	Eva	Brown	IT	65000
13	Mia	Jackson	HR	64000
19	Sam	Green	HR	63000
7	Grace	Taylor	HR	62000
16	Paul	Young	HR	61000
1	Alice	Smith	HR	60000
10	Jack	White	HR	58000
3	Charlie	Lee	Sales	55000
9	Irene	Thomas	Sales	53000
18	Rachel	Scott	Sales	57000
12	Leo	Martin	Sales	50000

AND, OR, NOT

Logical operators filter records based on multiple conditions.

-- Using AND, OR, NOT operators
p = """

SELECT * 
FROM 
  employees 
WHERE 
  department = 'Sales' AND salary > 50000;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
6	Frank	Wilson	Sales	70000
15	Olivia	Perez	Sales	68000
18	Rachel	Scott	Sales	57000

Numeric Operations

Perform arithmetic operations in SQL.

-- Calculating a new column
p = """

SELECT 
  first_name, 
  last_name, 
  salary, 
  salary * 1.1 AS new_salary 
FROM 
  employees;
"""

pd.read_sql_query(p, con)

first_name	last_name	salary	new_salary
Alice	Smith	60000	66000.0
Bob	Johnson	80000	88000.0
Charlie	Lee	55000	60500.0
David	Kim	75000	82500.0
Eva	Brown	65000	71500.0
Frank	Wilson	70000	77000.0
Grace	Taylor	62000	68200.0
Henry	Anderson	77000	84700.0
Irene	Thomas	53000	58300.0
Jack	White	58000	63800.0
Karen	Harris	69000	75900.0
Leo	Martin	50000	55000.0
Mia	Jackson	64000	70400.0
Noah	Lee	72000	79200.0
Olivia	Perez	68000	74800.0
Paul	Young	61000	67100.0
Quinn	King	76000	83600.0
Rachel	Scott	57000	62700.0
Sam	Green	63000	69300.0
Tina	Adams	81000	89100.0

LIKE and NOT LIKE

Pattern matching using LIKE.

-- Pattern matching
p = """

SELECT * 
FROM 
  employees 
WHERE 
  last_name 
LIKE 'S%';

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
1	Alice	Smith	HR	60000
18	Rachel	Scott	Sales	57000

BETWEEN

Range filtering using BETWEEN.

-- Filtering within a range
p = """

SELECT * 
FROM 
  employees 
WHERE 
  salary BETWEEN 40000 AND 60000;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
12	Leo	Martin	Sales	50000
3	Charlie	Lee	Sales	55000
9	Irene	Thomas	Sales	53000
10	Jack	White	HR	58000
1	Alice	Smith	HR	60000

OFFSET

Skip a specific number of rows before starting to return rows.

-- Skipping the first 5 rows
p = """

SELECT * 
FROM 
  employees 
OFFSET 5;

"""

pd.read_sql_query(p, con)

id	first_name	last_name	department	salary
6	Frank	Wilson	Sales	70000
7	Grace	Taylor	HR	62000
8	Henry	Anderson	IT	77000
9	Irene	Thomas	Sales	53000
10	Jack	White	HR	58000
11	Karen	Harris	IT	69000
12	Leo	Martin	Sales	50000
13	Mia	Jackson	HR	64000
14	Noah	Lee	IT	72000
15	Olivia	Perez	Sales	68000
16	Paul	Young	HR	61000
17	Quinn	King	IT	76000
18	Rachel	Scott	Sales	57000
19	Sam	Green	HR	63000
20	Tina	Adams	IT	81000

Intermediate SQL Concepts

Joins

Table: employees

id	first_name	last_name	department_id	salary
1	Alice	Smith	1	60000
2	Bob	Johnson	2	80000
3	Charlie	Lee	3	55000
4	David	Kim	1	75000
5	Eva	Brown	2	65000
6	Frank	Wilson	3	70000
7	Grace	Taylor	1	62000
8	Henry	Anderson	2	77000
9	Irene	Thomas	3	53000
10	Jack	White	1	58000
11	Karen	Harris	2	69000
12	Leo	Martin	3	50000
13	Mia	Jackson	1	64000
14	Noah	Lee	2	72000
15	Olivia	Perez	3	68000
16	Paul	Young	1	61000
17	Quinn	King	2	76000
18	Rachel	Scott	3	57000
19	Sam	Green	1	63000
20	Tina	Adams	2	81000

Table: departments

department_id	department_name
1	HR
2	IT
3	Sales

Combine rows from two or more tables based on a related column.

-- Inner join example
p = """

SELECT 
  employees.first_name, 
    employees.last_name, 
    departments.department_name
FROM 
  employees
INNER JOIN 
  departments ON employees.department_id = departments.department_id;

"""

pd.read_sql_query(p, con)

first_name	last_name	department_name
Alice	Smith	HR
David	Kim	HR
Grace	Taylor	HR
Jack	White	HR
Mia	Jackson	HR
Paul	Young	HR
Sam	Green	HR
Bob	Johnson	IT
Eva	Brown	IT
Henry	Anderson	IT
Karen	Harris	IT
Noah	Lee	IT
Quinn	King	IT
Tina	Adams	IT
Charlie	Lee	Sales
Frank	Wilson	Sales
Irene	Thomas	Sales
Leo	Martin	Sales
Olivia	Perez	Sales
Rachel	Scott	Sales

CAST

Convert data from one type to another.

-- Casting a column
p = """

SELECT 
  CAST(salary AS DECIMAL(10, 2)) 
FROM 
  employees;

"""

pd.read_sql_query(p, con)

salary
60000.00
80000.00
55000.00
75000.00
65000.00
70000.00
62000.00
77000.00
53000.00
58000.00
69000.00
50000.00
64000.00
72000.00
68000.00
61000.00
76000.00
57000.00
63000.00
81000.00

Aggregations

Perform calculations on a set of values.

-- Using aggregation functions
p = """

SELECT 
  department_id, 
  COUNT(employee_id) AS num_employees
FROM 
  employees
GROUP BY 
  department_id;

"""

pd.read_sql_query(p, con)

department_id	num_employees
1	7
2	7
3	6

GROUP BY and HAVING

Group rows that have the same values and filter groups.

-- Grouping rows and filtering groups
p = """
SELECT 
  department_id,
  COUNT(employee_id) AS num_employees
FROM 
  employees
GROUP BY 
  department_id
HAVING COUNT(employee_id) > 5;
"""

pd.read_sql_query(p, con)

department_id	num_employees
1	7
2	7
3	6

UNION, INTERSECT, MINUS

Combine result sets

Table: managers

manager_id	first_name	last_name
1	Michael	Brown
2	Sarah	Johnson
3	John	Lee

-- Union example
p = """
SELECT 
  first_name 
FROM 
  employees
UNION

SELECT 
  first_name
FROM
   managers;
"""

pd.read_sql_query(p, con)

first_name
Alice
Bob
Charlie
David
Eva
Frank
Grace
Henry
Irene
Jack
Karen
Leo
Mia
Noah
Olivia
Paul
Quinn
Rachel
Sam
Tina
Michael
Sarah
John

POSITION

Find the position of a substring.

-- Finding substring position
p = """
SELECT 
  POSITION('e' IN first_name) 
FROM 
  employees;
"""

pd.read_sql_query(p, con)

position
0
0
4
0
0
2
0
3
2
0
0
3
0
0
0
0
0
2
0
0

CASE

Conditional logic in `SQL.

-- Using CASE statements
p = """
SELECT 
  first_name, 
  last_name,
       CASE 
         WHEN salary > 60000 THEN 'High'
         WHEN salary BETWEEN 40000 AND 60000 THEN 'Medium'
         ELSE 'Low'
       END AS salary_category
FROM 
  employees;
"""

pd.read_sql_query(p, con)

first_name	last_name	salary_category
Alice	Smith	Medium
Bob	Johnson	High
Charlie	Lee	Medium
David	Kim	High
Eva	Brown	Medium
Frank	Wilson	High
Grace	Taylor	Medium
Henry	Anderson	High
Irene	Thomas	Medium
Jack	White	Medium
Karen	Harris	High
Leo	Martin	Medium
Mia	Jackson	Medium
Noah	Lee	High
Olivia	Perez	High
Paul	Young	Medium
Quinn	King	High
Rachel	Scott	Medium
Sam	Green	Medium
Tina	Adams	High

Joins

Introduction

It’s rare that a data analysis involves only a single data frame. Typically you have many data frames, and you must join them together to answer the questions that you’re interested in.

pandas has a really rich set of options for combining one or more data frames, with the two most important being concatenate and merge. Some of the examples in this chapter show you how to join a pair of data frames. Fortunately this is enough, since you can combine three data frames by combining two pairs.

# remove cell
import matplotlib_inline.backend_inline
import matplotlib.pyplot as plt

# Plot settings
plt.style.use("https://github.com/aeturrell/python4DS/raw/main/plot_style.txt")
matplotlib_inline.backend_inline.set_matplotlib_formats("svg")

Prerequisites

This chapter will use the pandas data analysis package.

Concatenate

If you have two or more data frames with the same index or the same columns, you can glue them together into a single data frame using pd.concat().

For the same columns, pass axis=0 to glue the index together; for the same index, pass axis=1 to glue the columns together. The concatenate function will typically be used on a list of data frames.

If you want to track where the original data came from in the final data frame, use the keys keyword.

Here’s an example using data on two different states’ populations that also makes uses of the keys option:

import pandas as pd
import urllib.request

base_url = "http://www.stata-press.com/data/r14/"
state_codes = ["ca", "il"]
end_url = "pop.dta"
headers = {'User-Agent': 'Mozilla/5.0'}

def fetch_data(url):
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as response:
        return pd.read_stata(response)

# This grabs the two data frames, one for each state
list_of_state_dfs = [fetch_data(base_url + state + end_url) for state in state_codes]

# Show example of first entry in list of data frames
print(list_of_state_dfs[0])

        county      pop
0  Los Angeles  9878554
1       Orange  2997033
2      Ventura   798364

# Concatenate the list of data frames
df = pd.concat(list_of_state_dfs, keys=state_codes, axis=0)

Note that the keys argument is optional, but is useful for keeping track of origin data frames within the merged data frame.

Exercise

Concatenate the follow two data frames:

df1 = pd.DataFrame([['a', 1], ['b', 2]],
                   columns=['letter', 'number'])

df2 = pd.DataFrame([['c', 3], ['d', 4]],
                   columns=['letter', 'number'])

Merge

There are so many options for merging data frames using pd.merge(left, right, on=..., how=... that we won’t be able to cover them all here. The most important features are: the two data frames to be merged, what variables (aka keys) to merge on (and these can be indexes) via on=, and how to do the merge (eg left, right, outer, inner) via how=. This diagram shows an example of a merge using keys from the left-hand data frame:

The how= keyword works in the following ways: - how='left' uses keys from the left data frame only to merge. - how='right' uses keys from the right data frame only to merge. - how='inner' uses keys that appear in both data frames to merge. - how='outer' uses the cartesian product of keys in both data frames to merge on.

Let’s see examples of some of these:

left = pd.DataFrame(
    {
        "key1": ["K0", "K0", "K1", "K2"],
        "key2": ["K0", "K1", "K0", "K1"],
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
    }
)
right = pd.DataFrame(
    {
        "key1": ["K0", "K1", "K1", "K2"],
        "key2": ["K0", "K0", "K0", "K0"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
# Right merge
pd.merge(left, right, on=["key1", "key2"], how="right")

	key1	key2	A	B	C	D
0	K0	K0	A0	B0	C0	D0
1	K1	K0	A2	B2	C1	D1
2	K1	K0	A2	B2	C2	D2
3	K2	K0	NaN	NaN	C3	D3

Note that the key combination of K2 and K0 did not exist in the left-hand data frame, and so its entries in the final data frame are NaNs. But it does have entries because we chose the keys from the right-hand data frame.

What about an inner merge?

pd.merge(left, right, on=["key1", "key2"], how="inner")

	key1	key2	A	B	C	D
0	K0	K0	A0	B0	C0	D0
1	K1	K0	A2	B2	C1	D1
2	K1	K0	A2	B2	C2	D2

Now we see that the combination K2 and K0 are excluded because they didn’t exist in the overlap of keys in both data frames.

Finally, let’s take a look at an outer merge that comes with some extra info via the indicator keyword:

pd.merge(left, right, on=["key1", "key2"], how="outer", indicator=True)

	key1	key2	A	B	C	D	_merge
0	K0	K0	A0	B0	C0	D0	both
1	K0	K1	A1	B1	NaN	NaN	left_only
2	K1	K0	A2	B2	C1	D1	both
3	K1	K0	A2	B2	C2	D2	both
4	K2	K0	NaN	NaN	C3	D3	right_only
5	K2	K1	A3	B3	NaN	NaN	left_only

Now we can see that the products of all key combinations are here. The indicator=True option has caused an extra column to be added, called ’_merge’, that tells us which data frame the keys on that row came from.

Exercise

Merge the following two data frames using the left_on and right_on keyword arguments to specify a join on lkey and rkey respectively:

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

Exercise

Merge the following two data frames on "a" using how="left" as a keyword argument:

df1 = pd.DataFrame({'a': ['foo', 'bar'], 'b': [1, 2]})
df2 = pd.DataFrame({'a': ['foo', 'baz'], 'c': [3, 4]})

What do you notice about the position .loc[1, "c"] in the merged data frame?

For more on the options for merging, see pandas’ comprehensive merging documentation.

Introduction to SQL Joins

SQL joins are used to combine rows from two or more tables based on a related column between them. Understanding joins is crucial for effective data retrieval and manipulation. This section will cover the four main types of joins: INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL JOIN, using visual aids and code snippets with a fake data frame.

Types of Joins

INNER JOIN

An INNER JOIN returns only the rows that have matching values in both tables.

-- Inner join example
p = """
SELECT 
  A.id, 
  A.name, 
  B.order_id
FROM 
  Customers A
INNER JOIN 
  Orders B ON A.id = B.customer_id;
"""
pd.read_sql_query(p, con)

LEFT JOIN

A LEFT JOIN returns all the rows from the left table and the matched rows from the right table. If no match is found, NULL values are returned for columns from the right table.

-- Left join example
p = """
SELECT 
  A.id, 
  A.name, 
  B.order_id
FROM 
  Customers A
LEFT JOIN 
  Orders B ON A.id = B.customer_id;
"""
pd.read_sql_query(p, con)

RIGHT JOIN

A RIGHT JOIN returns all the rows from the right table and the matched rows from the left table. If no match is found, NULL values are returned for columns from the left table.

p = """
SELECT 
  A.id, 
  A.name, 
  B.order_id
FROM 
  Customers A
RIGHT JOIN 
  Orders B ON A.id = B.customer_id;
"""
pd.read_sql_query(p, con)

FULL JOIN

A FULL JOIN returns all the rows when there is a match in either left or right table. Rows without a match in one of the tables will contain NULL values for columns from that table.

-- Full join example
p = """
SELECT 
  A.id, 
  A.name, 
  B.order_id
FROM 
  Customers A
FULL JOIN 
  Orders B ON A.id = B.customer_id;
"""
pd.read_sql_query(p, con)

Example Data Frames

To illustrate these joins, let’s consider two fake data frames: Customers and Orders.

import pandas as pd

# Creating a fake data frame for Customers
customers = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'name': ['Alice', 'Bob', 'Charlie', 'David']
})

# Creating a fake data frame for Orders
orders = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'customer_id': [1, 2, 2, 4]
})

# Display the data frames
print("Customers Data Frame")
print(customers)

Customers Data Frame
   id     name
0   1    Alice
1   2      Bob
2   3  Charlie
3   4    David

print("Orders Data Frame")
print(orders)

Orders Data Frame
   order_id  customer_id
0       101            1
1       102            2
2       103            2
3       104            4

Conclusion

SQL is an indispensable tool for anyone working with data. Its ability to manage, manipulate, and integrate data makes it essential for data analysis and decision-making. The examples provided in this chapter illustrate some of the key operations and techniques used in SQL, highlighting its importance in the field of data science.